EXPLORATORY DATA ANALYSIS ON TERRORISM

Godspower Uyanga (Data Scientist/Machine Learning Engineer)

Aim And Tools Used For Analysis

This work focuses on analyzing world terrorism and finding its major hot zones. The insights we discover are meant to help each country make better-informed decisions and choose the right security actions to protect its citizens.
We will be using Python with the Pandas, NumPy, Matplotlib, Seaborn and Plotly libraries for data analysis, and we will create visuals that support the analytic insights and storytelling.

Importing Important Dependencies For Analysis

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

Data Collection

Loading The Global Terrorism Data For Analysis

In [2]:
# low_memory=False makes pandas infer each column's type from the whole file, which avoids the mixed-type DtypeWarning
terrorism_data = pd.read_csv("global terrorism.csv", encoding = "latin-1", low_memory = False)

Perusing The First 5 Rows Of Our Data

In [3]:
terrorism_data.head()
Out[3]:
eventid iyear imonth iday approxdate extended resolution country country_txt region ... addnotes scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
0 197000000001 1970 7 2 NaN 0 NaN 58 Dominican Republic 2 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
1 197000000002 1970 0 0 NaN 0 NaN 130 Mexico 1 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN
2 197001000001 1970 1 0 NaN 0 NaN 160 Philippines 5 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
3 197001000002 1970 1 0 NaN 0 NaN 78 Greece 8 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
4 197001000003 1970 1 0 NaN 0 NaN 101 Japan 4 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN

5 rows × 135 columns

Checking The Shape Of The Data

In [4]:
rows, cols = terrorism_data.shape
print(f"We Have A Total Of {rows} Rows And {cols} Columns In The Data")
We Have A Total Of 181691 Rows And 135 Columns In The Data

We Have A Total Of 181691 Rows And 135 Columns, Which Makes It A Very Large Dataset

Exploratory Data Analysis And Data Preprocessing

With 181691 Rows And 135 Columns, Working With Every Column Would Not Add Much To Our Analysis, So We Rename The Columns And Extract Only The Significant Ones For A Better Analysis
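Since the notebook ends up keeping only 15 of the 135 columns, one option is to let `read_csv` parse just those columns through its `usecols` parameter and rename them in the same step. The sketch below runs on a hypothetical three-row extract held in memory (`io.StringIO`) with a shortened column map, so it does not need the real CSV; the keys are the raw GTD column names used in the rename cell.

```python
import io

import pandas as pd

# Hypothetical three-row extract in the same layout as the GTD CSV
# (the real file has 135 columns; only a few are shown here).
csv_text = """eventid,iyear,imonth,iday,country_txt,region_txt,nkill,nwound,extended
197000000001,1970,7,2,Dominican Republic,Central America & Caribbean,1,0,0
197000000002,1970,0,0,Mexico,North America,0,0,0
197001000001,1970,1,0,Philippines,Southeast Asia,1,0,0
"""

# Raw GTD column name -> readable name (shortened version of the notebook's map)
columns_to_keep = {
    "iyear": "Year",
    "imonth": "Month",
    "iday": "Day",
    "country_txt": "Country",
    "region_txt": "Region",
    "nkill": "Killed",
    "nwound": "Wounded",
}

# usecols parses only the listed columns (they come back in file order);
# low_memory=False reads the data in one pass so each column gets one dtype.
data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=list(columns_to_keep),
    low_memory=False,
).rename(columns=columns_to_keep)

print(data)
```

On the real file, the same idea would be `pd.read_csv("global terrorism.csv", encoding="latin-1", usecols=list(columns_to_keep), low_memory=False)` with the full 15-column map, which also lowers memory use.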

Exploring The Columns

In [5]:
terrorism_data.columns
Out[5]:
Index(['eventid', 'iyear', 'imonth', 'iday', 'approxdate', 'extended',
       'resolution', 'country', 'country_txt', 'region',
       ...
       'addnotes', 'scite1', 'scite2', 'scite3', 'dbsource', 'INT_LOG',
       'INT_IDEO', 'INT_MISC', 'INT_ANY', 'related'],
      dtype='object', length=135)

Renaming And Extracting Important Features For Analysis

In [6]:
# Map the raw GTD column names to readable ones (the raw country column is "country_txt")
terrorism_data.rename(columns={"iyear":"Year","imonth":"Month","iday":"Day","country_txt":"Country","provstate":"State","region_txt":"Region","attacktype1_txt":"AttackType","target1":"Target","nkill":"Killed","nwound":"Wounded","summary":"Summary","gname":"Group","targtype1_txt":"Target_Type","weaptype1_txt":"Weapon_Type","motive":"Motive"}, inplace = True)
In [7]:
terrorism_data.head()
Out[7]:
eventid Year Month Day approxdate extended resolution country Country region ... addnotes scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
0 197000000001 1970 7 2 NaN 0 NaN 58 Dominican Republic 2 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
1 197000000002 1970 0 0 NaN 0 NaN 130 Mexico 1 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN
2 197001000001 1970 1 0 NaN 0 NaN 160 Philippines 5 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
3 197001000002 1970 1 0 NaN 0 NaN 78 Greece 8 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
4 197001000003 1970 1 0 NaN 0 NaN 101 Japan 4 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN

5 rows × 135 columns

Extracting The New Columns

In [9]:
# Keep only the renamed columns; .copy() makes an independent DataFrame instead of a view of the original
terrorism_data = terrorism_data[["Year","Month","Day","Country","State","Region","AttackType","Target","Killed","Wounded","Summary","Group","Target_Type","Weapon_Type","Motive"]].copy()
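By default `DataFrame.rename` ignores mapping keys that do not match any column, so a misspelled key (for example `"country_text"` instead of the GTD column `"country_txt"`) fails silently and the column keeps its raw name. A small sketch on a made-up two-row frame showing how `errors="raise"` surfaces such a typo immediately:

```python
import pandas as pd

# Made-up frame with two of the raw GTD column names
raw = pd.DataFrame({"iyear": [1970, 1971], "country_txt": ["Mexico", "Greece"]})

# A misspelled key ("country_text") is ignored without any warning ...
silently_ignored = raw.rename(columns={"iyear": "Year", "country_text": "Country"})
print(list(silently_ignored.columns))  # ['Year', 'country_txt']

# ... but errors="raise" turns the typo into a KeyError.
try:
    raw.rename(columns={"iyear": "Year", "country_text": "Country"}, errors="raise")
except KeyError as exc:
    print("Rename failed:", exc)

# With the correct key the rename succeeds.
renamed = raw.rename(columns={"iyear": "Year", "country_txt": "Country"}, errors="raise")
print(list(renamed.columns))  # ['Year', 'Country']
```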

Let's View Our Renamed Terrorism Data

In [10]:
terrorism_data.head()
Out[10]:
Year Month Day Country State Region AttackType Target Killed Wounded Summary Group Target_Type Weapon_Type Motive
0 1970 7 2 Dominican Republic NaN Central America & Caribbean Assassination Julio Guzman 1.0 0.0 NaN MANO-D Private Citizens & Property Unknown NaN
1 1970 0 0 Mexico Federal North America Hostage Taking (Kidnapping) Nadine Chaval, daughter 0.0 0.0 NaN 23rd of September Communist League Government (Diplomatic) Unknown NaN
2 1970 1 0 Philippines Tarlac Southeast Asia Assassination Employee 1.0 0.0 NaN Unknown Journalists & Media Unknown NaN
3 1970 1 0 Greece Attica Western Europe Bombing/Explosion U.S. Embassy NaN NaN NaN Unknown Government (Diplomatic) Explosives NaN
4 1970 1 0 Japan Fukouka East Asia Facility/Infrastructure Attack U.S. Consulate NaN NaN NaN Unknown Government (Diplomatic) Incendiary NaN

Statistical Description Of The Data

In [11]:
terrorism_data.describe()
Out[11]:
Year Month Day Killed Wounded
count 181691.000000 181691.000000 181691.000000 171378.000000 165380.000000
mean 2002.638997 6.467277 15.505644 2.403272 3.167668
std 13.259430 3.388303 8.814045 11.545741 35.949392
min 1970.000000 0.000000 0.000000 0.000000 0.000000
25% 1991.000000 4.000000 8.000000 0.000000 0.000000
50% 2009.000000 6.000000 15.000000 0.000000 0.000000
75% 2014.000000 9.000000 23.000000 2.000000 2.000000
max 2017.000000 12.000000 31.000000 1570.000000 8191.000000
In [12]:
terrorism_data.isnull().sum()
Out[12]:
Year                0
Month               0
Day                 0
Country             0
State             421
Region              0
AttackType          0
Target            636
Killed          10313
Wounded         16311
Summary         66129
Group               0
Target_Type         0
Weapon_Type         0
Motive         131130
dtype: int64
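Raw null counts are easier to compare as percentages of all rows; on the full data, for example, `Motive` is missing in 131130 of 181691 rows, roughly 72.2%. A sketch of the calculation on a made-up four-row sample:

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in the same kinds of columns as the GTD extract
sample = pd.DataFrame({
    "Killed": [1.0, np.nan, 3.0, 0.0],
    "Wounded": [0.0, np.nan, np.nan, 2.0],
    "Motive": [None, None, None, "Retaliation"],
})

# isnull().mean() is the fraction of missing values per column
missing_pct = (sample.isnull().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```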

Working On Null Values

In [13]:
# Replace every missing value with 0; assigning the result avoids modifying a slice in place
terrorism_data = terrorism_data.fillna(0)
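The cell above fills every missing value with 0, including text columns such as `State` and `Motive`, which then hold a mix of strings and the number 0. One alternative, sketched here on a made-up two-row frame and not the approach used in this notebook, passes a per-column dictionary to `fillna`. Note that filling unknown casualty counts with 0 leaves column sums unchanged but pulls per-attack averages down.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Killed": [1.0, np.nan],
    "Wounded": [np.nan, 4.0],
    "State": ["Tarlac", None],
    "Motive": [None, "Retaliation"],
})

# Numeric casualty counts: missing -> 0 (as the notebook does).
# Text columns: missing -> an explicit "Unknown" label, so each column keeps one type.
fill_values = {"Killed": 0, "Wounded": 0, "State": "Unknown", "Motive": "Unknown"}
cleaned = sample.fillna(value=fill_values)
print(cleaned)
```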

Checking For Any Remaining Null Values

In [14]:
terrorism_data.isnull().sum()
Out[14]:
Year           0
Month          0
Day            0
Country        0
State          0
Region         0
AttackType     0
Target         0
Killed         0
Wounded        0
Summary        0
Group          0
Target_Type    0
Weapon_Type    0
Motive         0
dtype: int64

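Heat Map Showing The Correlation Analysis

In [66]:
# Set the plot style first: rcParams only affect figures created after they are set
plt.rcParams['font.size'] = 40
plt.rcParams['figure.dpi'] = 250
plt.rcParams['figure.figsize'] = (15,5)

print("------------------   The Heat Map Below Shows The Correlation Of Our Data -------------\n")

# numeric_only=True restricts the correlation to the numeric columns (Year, Month, Day, Killed, Wounded)
sns.heatmap(terrorism_data.corr(numeric_only=True), annot=True, linewidths=.10)
plt.show()
------------------   The Heat Map Below Shows The Correlation Of Our Data -------------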
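pandas 2.0 changed the default of `DataFrame.corr(numeric_only=...)` to `False`, so calling `corr()` on a frame that still contains text columns such as `Country` raises an error instead of silently dropping them. A sketch on a made-up frame showing that `numeric_only=True` keeps only the numeric columns:

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [1970, 1980, 1990, 2000],
    "Killed": [1, 2, 3, 4],
    "Wounded": [2, 4, 6, 8],
    "Country": ["Mexico", "Greece", "Japan", "Iraq"],  # text column, excluded below
})

# Pairwise Pearson correlations between the numeric columns only
corr_matrix = sample.corr(numeric_only=True)
print(corr_matrix.round(2))
```

In this toy frame every numeric column rises linearly with the others, so every correlation is 1.0; on real data the heat map shows how weakly or strongly the columns move together.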

Visualizing The Distribution Of Each Numeric Column In Our Dataset

In [16]:
terrorism_data.hist(figsize = (50,25), color = "lightblue", ec="red", lw = 10);

Global Terrorist Activities By Region In Each Year Using An Area Plot

In [82]:
pd.crosstab(terrorism_data.Year, terrorism_data.Region).plot(kind ="area", figsize =(50,25));
plt.title("GLOBAL TERRORIST ACTIVITIES BY REGION IN EACH YEAR USING AREA PLOT TO VISUALIZE");
plt.ylabel("NUMBER OF GLOBAL ATTACKS");
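The area plot is built from `pd.crosstab`, which counts how many rows fall into each (Year, Region) pair. A minimal sketch on a made-up five-attack sample shows the table that gets plotted: one row per year, one column per region, and each column becomes one stacked band of the area plot.

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [1970, 1970, 1970, 1971, 1971],
    "Region": ["Western Europe", "North America", "Western Europe",
               "South Asia", "Western Europe"],
})

# Rows = years, columns = regions (sorted alphabetically), cells = number of attacks
attacks_by_region = pd.crosstab(sample["Year"], sample["Region"])
print(attacks_by_region)
```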
In [18]:
plt.subplots(figsize = (50,25));
# Pass the column with the keyword x=; from seaborn 0.12 only `data` may be passed positionally
sns.countplot(x="Year", data=terrorism_data, palette="RdYlGn_r", edgecolor=sns.color_palette("husl",20));
plt.xticks(rotation =90);
plt.title("NUMBER OF GLOBAL TERRORIST ACTIVITIES THAT HAPPEN EACH YEAR");

The Largest Number Of Terrorist Activities Happened In 2014, While The Fewest Happened In 1971

Let's Find The Number Of Attacks That Occurred In 1970 And 2017 And The Percentage Increase In Global Attacks

In [19]:
print(".................... CHECK THIS .........................................\n")
# Number of attacks recorded in each year
attacks_per_year = terrorism_data.Year.value_counts().to_dict()
# Percentage increase is measured relative to the starting year (1970)
rate = ((attacks_per_year[2017] - attacks_per_year[1970]) / attacks_per_year[1970]) * 100
print("A Total Of", attacks_per_year[1970], "Global Attacks Happened In 1970 And A Total Of", attacks_per_year[2017], "Global Attacks Happened In 2017")
print(".........................................................................\n")
print("The Number Of Global Attacks Has Increased By", np.round(rate, 0), "% From 1970 To 2017")
.................... CHECK THIS .........................................

A Total Of 651 Global Attacks Happened In 1970 And A Total Of 10900 Global Attacks Happened In 2017
.........................................................................

The Number Of Global Attacks Has Increased By 1574.0 % From 1970 To 2017
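A percentage increase is measured against the starting value: increase = (attacks in 2017 - attacks in 1970) / attacks in 1970 × 100. Dividing by the 2017 value instead gives the share of the 2017 total that is new since 1970, which is a different quantity. A small sketch using the two yearly counts printed above (651 and 10900); the helper name `percentage_change` is only illustrative:

```python
def percentage_change(start: float, end: float) -> float:
    """Percentage change from start to end, relative to the starting value."""
    return (end - start) / start * 100


attacks_1970 = 651
attacks_2017 = 10900

increase = percentage_change(attacks_1970, attacks_2017)
print(f"{increase:.1f}%")  # 1574.3%

# Dividing by the END value answers a different question:
# what share of the 2017 total is growth since 1970?
share_of_2017 = (attacks_2017 - attacks_1970) / attacks_2017 * 100
print(f"{share_of_2017:.1f}%")  # 94.0%
```

In other words, the number of recorded attacks in 2017 was roughly 16.7 times the 1970 figure.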

Let's Check The Methods Used In These Global Attacks

In [84]:
plt.figure(figsize=(50,25))
# x= keyword required from seaborn 0.12; order sorts the bars from most to least frequent
sns.countplot(x="AttackType", data=terrorism_data, order=terrorism_data["AttackType"].value_counts().index, palette="hot");
plt.xticks(rotation = 90);
plt.xlabel("METHODS OF GLOBAL ATTACK");
plt.ylabel("NUMBER OF ATTACKS");
plt.title("TERRORIST GROUPS AND THEIR METHOD OF ATTACK", color="red");

Notice That Bombing/Explosion Was The Method Most Used By Terrorist Groups For Attacks
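Raw counts make it hard to say how dominant one attack type is; `value_counts(normalize=True)` returns each type's share of all attacks instead. A sketch on a made-up series of six attacks:

```python
import pandas as pd

attack_types = pd.Series([
    "Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
    "Assassination", "Bombing/Explosion", "Armed Assault",
])

# normalize=True gives fractions of the total; multiply by 100 for percentages
shares = (attack_types.value_counts(normalize=True) * 100).round(1)
print(shares)
```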

NUMBER OF PEOPLE WOUNDED VERSUS NUMBER OF PEOPLE KILLED IN EACH ATTACK, BY COUNTRY AND YEAR

In [21]:
# One point per attack; range_color is dropped because it only applies to a continuous color scale
px.scatter(terrorism_data, x="Wounded", y="Killed", hover_name="Country", animation_frame="Year", animation_group="Country", color="AttackType", labels={"Killed":"Death Victims","Wounded":"Wounded Victims"}, title="Number Of Wounded VS Killed People In Each Country For Each Year")
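The scatter above plots one point per attack, so a country can appear many times in the same year frame. To get one point per country per year, the victims could be summed first with `groupby`. A sketch on a made-up four-attack sample:

```python
import pandas as pd

incidents = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2015],
    "Country": ["Iraq", "Iraq", "Nigeria", "Iraq"],
    "Killed": [10, 5, 7, 3],
    "Wounded": [20, 0, 4, 1],
})

# One row per (Year, Country) with the total killed and wounded
yearly_totals = (
    incidents.groupby(["Year", "Country"], as_index=False)[["Killed", "Wounded"]].sum()
)
print(yearly_totals)
```

`yearly_totals` could then be passed to `px.scatter(yearly_totals, x="Wounded", y="Killed", animation_frame="Year", hover_name="Country")`.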

VISUALIZING TO DRAW INSIGHTS ABOUT TERRORISTS' MAJOR TARGETS

In [26]:
plt.figure(figsize = (50,25))
# x= keyword required from seaborn 0.12; bars ordered from most to least targeted
sns.countplot(x="Target_Type", data=terrorism_data, order=terrorism_data["Target_Type"].value_counts().index, palette="magma")
plt.xticks(rotation = 90)
plt.xlabel("TARGET TYPE")
plt.ylabel("NUMBER OF ATTACKS")
plt.title("TERRORIST MAJOR TARGET")
plt.show()

Notice That Terrorist Groups Mostly Target Private Citizens And Property

Now Let's Check The Most Active Terrorist Groups

In [34]:
terrorism_data["Group"].value_counts()[1:20].values
Out[34]:
array([7478, 5613, 4555, 3351, 3288, 2772, 2671, 2487, 2418, 2310, 2024,
       1878, 1630, 1606, 1561, 1351, 1125, 1062, 1020], dtype=int64)
In [41]:
plt.subplots(figsize = (50,25))
sns.barplot(y=terrorism_data["Group"].value_counts()[1:20].index, x = terrorism_data["Group"].value_counts()[1:20].values, palette="copper")
plt.title("THE MOST ACTIVE GLOBAL TERRORIST GROUPS OR ORGANIZATIONS", color="red")
plt.show()

Notice That The Most Active Global Terrorist Organization Is The Taliban (Excluding Attacks Whose Perpetrator Is Unknown)
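The bar chart slices `value_counts()[1:20]`, which skips the most frequent entry (presumably the `"Unknown"` perpetrator label) and keeps the next 19 groups. Filtering that label out explicitly does not depend on it being ranked first. A sketch on a made-up series:

```python
import pandas as pd

groups = pd.Series(["Unknown", "Taliban", "Unknown", "ISIL", "Taliban", "Unknown", "Taliban"])

# Drop the "Unknown" placeholder by label, then keep the 20 most frequent groups
top_groups = groups[groups != "Unknown"].value_counts().head(20)
print(top_groups)
```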

Discovering The Total Number Of Terrorist Attacks In The 20 Most Affected Countries And In Each Region

In [42]:
fig, axes = plt.subplots(figsize=(50,25), nrows =1, ncols = 2)
sns.barplot(x=terrorism_data["Country"].value_counts()[:20].values, y=terrorism_data["Country"].value_counts()[:20].index, ax=axes[0],palette="magma")
axes[0].set_title("TERRORIST ATTACK PER COUNTRY")
sns.barplot(x=terrorism_data["Region"].value_counts().values, y = terrorism_data["Region"].value_counts().index, ax=axes[1])
axes[1].set_title("TERRORIST ATTACKS PER REGION")
fig.tight_layout()
plt.show()

Notice That Iraq Has The Highest Number Of Terrorist Attacks, And That Attacks Are Most Frequent In The Middle East & North Africa Region

Let's Look At The Total Number Of Terrorist Attacks That Occurred In Nigeria

In [61]:
# Look Nigeria up by its label rather than by its position in the ranking
terrorism_data["Country"].value_counts().loc[["Nigeria"]]
Out[61]:
Nigeria    3907
Name: Country, dtype: int64

We Have A Total Of 3907 Attacks In Nigeria

In [68]:
# Groups responsible for attacks in Nigeria: filter the rows by country first, then count the groups
terrorism_data.loc[terrorism_data["Country"] == "Nigeria", "Group"].value_counts().head(10)
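Selecting rows by position (a slice such as `[11:12]`) depends on how the data happens to be ordered; a boolean mask on `Country` selects by value instead. The helper below, `country_profile`, is hypothetical (not part of pandas or of this notebook) and returns a country's number of attacks together with its most frequent groups; the sample data is made up.

```python
from __future__ import annotations

import pandas as pd


def country_profile(df: pd.DataFrame, country: str, top_n: int = 5) -> tuple[int, pd.Series]:
    """Hypothetical helper: attack count and most frequent groups for one country."""
    in_country = df[df["Country"] == country]  # boolean mask on the Country column
    return len(in_country), in_country["Group"].value_counts().head(top_n)


sample = pd.DataFrame({
    "Country": ["Nigeria", "Nigeria", "Iraq", "Nigeria"],
    "Group": ["Group A", "Group B", "Group C", "Group A"],
})

total, top = country_profile(sample, "Nigeria")
print(total)
print(top)
```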

TOTAL NUMBER OF ATTACKS IN EACH COUNTRY ON A GLOBE

In [74]:
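# count() tallies non-null values per column; the nulls were filled earlier, so every column holds the number of attacks per country
global_terrorism = terrorism_data.groupby(["Country"], as_index = False).count()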
In [75]:
global_terrorism.head()
Out[75]:
Country Year Month Day State Region AttackType Target Killed Wounded Summary Group Target_Type Weapon_Type Motive
0 Afghanistan 12731 12731 12731 12731 12731 12731 12731 12731 12731 12731 12731 12731 12731 12731
1 Albania 80 80 80 80 80 80 80 80 80 80 80 80 80 80
2 Algeria 2743 2743 2743 2743 2743 2743 2743 2743 2743 2743 2743 2743 2743 2743
3 Andorra 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4 Angola 499 499 499 499 499 499 499 499 499 499 499 499 499 499
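`groupby(...).count()` counts non-null values in every column, which is why each column of `global_terrorism` repeats the same number once the nulls have been filled. `groupby(...).size()` counts rows directly and gives a single, clearly named result. A sketch on a made-up sample with one missing value shows the difference:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Country": ["Iraq", "Iraq", "Andorra"],
    "Killed": [3.0, np.nan, 0.0],
})

# count() skips the missing Killed value
killed_counts = sample.groupby("Country")["Killed"].count()
print(killed_counts)  # Andorra: 1, Iraq: 1

# size() counts rows per group regardless of missing values
attacks = sample.groupby("Country").size().rename("Attacks")
print(attacks)  # Andorra: 1, Iraq: 2
```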

Visual Showing The Total Number Of Terrorist Attacks In Each Country

In [77]:
fig = px.choropleth(global_terrorism, locations = "Country", locationmode = "country names",color="Year",hover_name ="Country",projection="orthographic", title="Total Number Of Attacks(1970-2017)",labels = {"Year":"Attacks"})
fig.show()

Countries That Suffered The Maximum And Minimum Number Of Attacks

In [78]:
max_count = global_terrorism["Year"].max()
max_id    = global_terrorism["Year"].idxmax()
max_name  = global_terrorism["Country"][max_id]
min_count = global_terrorism["Year"].min()
min_id    = global_terrorism["Year"].idxmin()
min_name  = global_terrorism["Country"][min_id]
In [81]:
print(max_name, "Has Suffered The Maximum Number Of Terror Attacks:", max_count)
print(min_name, "Has Suffered The Minimum Number Of Terror Attacks:", min_count)
Iraq Has Suffered The Maximum Number Of Terror Attacks: 24636
Andorra Has Suffered The Minimum Number Of Terror Attacks: 1
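`idxmax` and `idxmin` return a single label even when several countries share the extreme value, so a minimum of 1 may hide other countries that also have exactly one recorded attack. A sketch on a made-up sample that lists every country tied at the minimum:

```python
import pandas as pd

# Made-up attack records, one country name per attack
countries = pd.Series(["Iraq", "Iraq", "Iraq", "Nigeria", "Nigeria", "Andorra", "Country B"])

attacks_per_country = countries.value_counts()

# idxmax() returns the index label (country name) of the largest count;
# on ties, idxmax()/idxmin() return only the first label encountered
most_attacked = attacks_per_country.idxmax()
print(most_attacked, attacks_per_country.max())

# Listing every country at the minimum makes ties visible
min_count = attacks_per_country.min()
tied_minimum = sorted(attacks_per_country[attacks_per_country == min_count].index)
print(tied_minimum, min_count)
```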

Insights Derived From The Above Exploratory Data Analysis (EDA)

  • Private Citizens And Property Are The Most Frequent Targets Of Attacks
  • The Taliban And ISIL Are The Most Active Organizations
  • The Middle East & North Africa Region Is The Most Targeted Region
  • Iraq Has Suffered The Highest Number Of Attacks
  • Bombing/Explosion Was The Method Most Used By Terrorist Groups For Attacks